Finding words in alphabet soup: Inference on freeform character recognition for historical scripts

Authors

  • Nicholas R. Howe
  • Shaolei Feng
  • R. Manmatha
Abstract

This paper develops word recognition methods for historical handwritten cursive and printed documents. It employs a powerful segmentation-free letter detection method based upon joint boosting with histogram-of-gradients features. Efficient inference on an ensemble of hidden Markov models can select the most probable sequence of candidate character detections to recognize complete words in ambiguous handwritten text, drawing on character n-gram and physical separation models. Experiments with two corpora of handwritten historic documents show that this approach recognizes known words more accurately than previous efforts, and can also recognize out-of-vocabulary words.
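
The inference step described in the abstract can be illustrated with a much-simplified sketch: a single Viterbi-style dynamic program over candidate character detections, scored by a character bigram model and a Gaussian penalty on horizontal spacing. This is not the paper's ensemble of hidden Markov models. The function name `best_word`, the detection tuple layout, the log-odds detector scores, and the `expected_gap` and `gap_sigma` parameters are all assumptions made for illustration.

```python
def best_word(detections, bigram_logp, expected_gap=10.0, gap_sigma=5.0):
    """Pick the highest-scoring left-to-right chain of character detections.

    detections  : list of (char, x_center, detector_score) tuples (hypothetical layout)
    bigram_logp : function (prev_char, next_char) -> log-probability;
                  "^" marks word start and "$" marks word end
    Returns (word, total_score).
    """
    dets = sorted(detections, key=lambda d: d[1])  # order candidates left to right
    n = len(dets)
    if n == 0:
        return "", float("-inf")

    # best[j]: best score of any chain that ends at detection j
    best = [score + bigram_logp("^", ch) for ch, _, score in dets]
    back = [None] * n

    for j in range(n):
        cj, xj, sj = dets[j]
        for i in range(j):
            ci, xi, _ = dets[i]
            gap = xj - xi
            if gap <= 0:
                continue  # a successor must lie strictly to the right
            # Assumed separation model: Gaussian log-penalty around expected_gap
            sep = -0.5 * ((gap - expected_gap) / gap_sigma) ** 2
            cand = best[i] + bigram_logp(ci, cj) + sep + sj
            if cand > best[j]:
                best[j] = cand
                back[j] = i

    # Choose the final character, including the end-of-word transition
    end = max(range(n), key=lambda j: best[j] + bigram_logp(dets[j][0], "$"))
    total = best[end] + bigram_logp(dets[end][0], "$")

    # Follow back-pointers to recover the character sequence
    chars = []
    j = end
    while j is not None:
        chars.append(dets[j][0])
        j = back[j]
    return "".join(reversed(chars)), total


# Toy bigram table (made-up values); unseen pairs get a low default score
_TOY_BIGRAMS = {
    ("^", "c"): -0.5, ("c", "a"): -0.5, ("a", "t"): -0.5, ("t", "$"): -0.5,
    ("c", "o"): -0.7, ("o", "t"): -0.7,
}

def toy_bigram_logp(prev_char, next_char):
    return _TOY_BIGRAMS.get((prev_char, next_char), -3.0)


if __name__ == "__main__":
    # Four overlapping candidates: "o" is a spurious detection competing with "a"
    candidates = [("c", 5, 2.0), ("a", 15, 1.5), ("o", 16, 1.0), ("t", 25, 2.0)]
    print(best_word(candidates, toy_bigram_logp))
```

Because the chain may start and end at any detection, spurious candidates are simply skipped rather than forced into the word. Since the bigram model scores character pairs rather than whole dictionary entries, a sketch like this can in principle output words outside a fixed vocabulary.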

Similar resources

A Hidden Markov Model for Alphabet-Soup Word Recognition

Recent work on the “alphabet soup” paradigm has demonstrated effective segmentation-free character-based recognition of cursive handwritten historical text documents. The approach first uses a joint boosting technique to detect potential characters, the alphabet soup. A second stage uses a dynamic programming algorithm to recover the correct sequence of characters. Despite experimental success, ...

Handwritten Character Recognition using Modified Gradient Descent Technique of Neural Networks and Representation of Conjugate Descent for Training Patterns

The purpose of this study is to analyze the performance of the backpropagation algorithm with changing training patterns and the second momentum term in feedforward neural networks. This analysis is conducted on 250 different words of three small letters from the English alphabet. These words are presented to two vertical segmentation programs which are designed in MATLAB and based on portions (1...

New Approaches for Cursive Languages Recognition: Machine and Hand Written Scripts and Texts

Three different approaches to pattern classification and recognition are considered in this paper. The main patterns considered are images representing the alphabets of cursive-script languages, particularly the Arabic alphabet. The practical results of written scripts recognition led to the possibility of applying the main ideas and criteria to written and spoken texts and...

A New Approach for Hindi Optical Character Recognition Based On Neural Networks

OCR is the acronym for Optical Character Recognition. This technology allows a machine...

OCR for printed Kannada text to Machine editable format using Database approach

This paper describes an Optical Character Recognition (OCR) system for printed text documents in Kannada, a South Indian language. The proposed OCR system recognizes printed Kannada text and can handle all types of Kannada characters. The system first extracts an image of the Kannada script, then segments the image into lines, then segments the words into sub-character level piece...

Journal:
  • Pattern Recognition

Volume 42

Publication date: 2009